Abstract

Computational prediction of protein structures is a difficult task, which involves fast and accurate 
evaluation of candidate model structures. We propose to enhance single model quality assessment with 
a functionality evaluation phase for proteins whose quantitative functional characteristics are known. 
In particular, this idea can be applied to evaluation of structural models of ion channels, whose main 
function - conducting ions - can be quantitatively measured with the patch-clamp technique providing 
the current-voltage characteristics. The study was performed on a set of KcsA channel models obtained 
from complete and incomplete contact maps. A fast continuous electrodiffusion model was used for 
calculating the current-voltage characteristics of structural models. We found that the computed charge 
selectivity and total current were sensitive to structural and electrostatic quality of models. In practical 
terms, we show that evaluating predicted conductance values is an appropriate method to eliminate
models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on
the basis of their predicted charge selectivity results in a substantial enrichment of the candidate set in
highly accurate models. In addition to being a proof of concept, our function-oriented single model
quality assessment tool can be directly applied for evaluation of structural models of strongly-selective 
protein channels. Finally, our work raises an important question whether a computational validation of 
functionality should not be included in the evaluation process of structural models, whenever possible. 
1 Background

Currently there are over 48 million protein sequences stored in the resources of the UniProt Consortium,
while only 109 000 structures are deposited in the Protein Data Bank (PDB) [1], and the gap is constantly
increasing. Computational methods for protein structure prediction are believed to be able to solve this
problem. These methods may help to identify and counteract causes of various pathological processes
through computational drug design [2][3], drug target identification [4], and protein design [5].

Computational prediction of protein structures is a difficult task, which also involves fast and accurate 
evaluation of candidate model structures. The ultimate verification of quality of a protein model requires 
availability of the native structure, or at least of its close homologs. Typically, the assessment is based on 
deviations between positions of equivalent atoms in the native protein structure and in the assessed model. 
Classical methods include the Root Mean Square Deviation (RMSD) or Global Distance Test (GDT) used
in the Critical Assessment of protein Structure Prediction competition (CASP) [6]. There are also other
methods, which express structural dissimilarity between structures, combining global and local measures or
considering only some of the distances [7][8][9][10][11][12][13].

In real life situations native structures are often not attainable, which makes model evaluation a challenge. 
To resolve it, numerous Model Quality Assessment Programs (MQAPs), which estimate the quality of 
produced models and select the best predictions, have been proposed. MQAPs can be divided into three 
main groups: single-model, quasi single-model, and consensus methods. Consensus methods (also known as
clustering methods) rank models in an ensemble in order to provide relative quality scores [14][15][16]. The quasi
single-model class includes methods which evaluate a model against structural templates [17][18]. Finally,
single-model methods (often referred to as true MQAPs) predict similarity between a single model and
the unknown native structure based on a wide range of structure- and sequence-based features of assessed
models, such as solvent accessible area, secondary structure, residue and atom contact maps, evolutionary
information, and statistical potentials [19][20][21]. The CASP10 experiment showed that consensus MQAPs
outperformed single and quasi single-model methods in case of easy and moderate targets; however, in case
of difficult, free modeling targets without known homologs, the chances were even. One of the main reasons
for developing new single and quasi single-model methods is that the consensus methods are unable to detect
low quality models if the whole ensemble of models consists solely of low quality structures. Moreover, the
ability of consensus methods to select the best models in groups of similar structures is limited.

In this work we propose to enhance single model assessment with a functionality evaluation phase for 
proteins whose quantitative functional characteristics are known. This approach can yield useful knowledge 
showing whether a protein model is functionally correct, which is complementary to the typical assessment 
based on structural features. The main difficulty lies in measuring and modeling the functionality efficiently. The
functionality needs to be an experimentally measurable property that is sensitive to structural details of a molecule. At
the same time, a modeling method needs to be fast enough to efficiently score hundreds of structural models. 

Here, we apply this idea to evaluation of structural models of ion channels. The main function of these 
proteins is conducting ions, which can be quantitatively measured with the patch-clamp technique providing 
current-voltage (I-V) characteristics of a single channel [52]. Thus, current-voltage characteristics can be
used as a benchmark functionality for the structural model assessment. In principle, calculation of a complete
I-V curve resulting from a model structure can be performed with Molecular Dynamics (MD; [23][24]),
which treats the pore and ions in a fully discrete way, or with Brownian Dynamics (BD; [25][26]), which
treats the pore and the solute in a continuous manner and the ions discretely. However, both methods
are computationally expensive and thus slow. MD in particular is impractical for prediction of the current.
The alternative is the 3-Dimensional Poisson-Nernst-Planck flow model (3D PNP), a continuous steady-state
theory, in which ions are represented by their position-dependent average concentrations [27][28][29].
3D PNP is less accurate than MD and BD methods but manyfold faster, typically requiring 3-5 CPU minutes for
one channel structure [30]. While, due to its simplicity, the classical 3D PNP is generally not suitable to
model complex physical phenomena, it has been shown to be capable of accounting for effects of single point
mutations and of predicting I-V characteristics of a quality sufficient for a MQAP [31][30].

In this study, we apply the computationally enhanced 3D PNP model [23] as a function-oriented
single-model MQAP on a set of structural models of the KcsA ion channel. First, models of diverse quality
are obtained from complete and incomplete contact maps. Then, relations between channel structural and 
functional features are investigated. Finally, the predictive power of selected functional characteristics is 
assessed. 

2 Materials and Methods 

KcsA is a relatively well-studied potassium channel for which an experimentally solved structure in the
open-conductive configuration is available in the PDB under accession number 3FB8. It is relatively small:
its transmembrane domain consists of 4 identical units of 87 amino acids each. Patch-clamp measurements
at ±100 mV revealed relatively high conductance from 57 to 75 pS, mild outward rectification (1.29) and
infinite cation to anion selectivity [32] (Tab.).

In order to generate a set of models of diverse quality, the experimental structure 3FB8 was reduced
to contact maps of information completeness varying from 30% to 100%. Then, spatial coordinates were
reconstructed from the contact maps using C2S-pipeline, which applies several state-of-the-art bioinformatic
tools [33]. Structural quality of a reconstructed model was measured using overall and single amino acid
RMSD related to the original PDB structure, the diameter of the entrance to the selectivity filter (SF) and
deviation of oxygen atoms in SF.

The electrostatics of channel models was calculated using the Poisson-Boltzmann method and the ion 
flux was computed using the classical 3D PNP model. The current-voltage characteristics were quantified at 
external voltage of ±100 mV using plain values and absolute deviations of the inward and outward current 
(or equivalent conductances), inward and outward charge selectivity (i.e. ratio of cation to anion current) 
and rectification of the current. When applied to the reference structure, the electrodiffusion model properly
predicted outward rectification of the channel and virtually infinite cation to anion selectivity (above 100:1)
while total currents were underestimated 3-4-fold (Tab.) [50].

2.1 Computational pipeline 

Our in-house software was used to generate a contact map (CMAP) based on the PDB file. A CMAP was
a square matrix with entries of -1, 0 and 1. A pair of residues was assumed to be in contact if the Cα atoms of both residues
were within 12 Å of one another. This distance was previously reported as the optimal contact distance
for a CMAP-based protein reconstruction [51]. Remaining pairs were attributed a status of non-contact in
the CMAP. In order to obtain models of different qualities, CMAPs reduced to 90%, 70%, 50%, and 30% of
information were also generated. CMAP reduction was conducted by substituting the specified percentage
of randomly selected contacts and the same percentage of non-contacts with the status of “unknown”. The
selection was drawn from a uniform distribution; therefore, equal portions of information on contact
sites were lost in all parts of the structure.
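The contact map construction and reduction described above can be sketched as follows. This is a minimal illustration and not the in-house software: the function names, the encoding (1 = contact, 0 = non-contact, -1 = unknown) and the choice to mask symmetric residue pairs are assumptions, since the text states only that the matrix holds the values -1, 0 and 1.

```python
import math
import random

# Assumed encoding of the three states; the source only lists the values.
CONTACT, NON_CONTACT, UNKNOWN = 1, 0, -1


def contact_map(ca_coords, cutoff=12.0):
    """Build a square contact map from C-alpha coordinates given in angstroms."""
    n = len(ca_coords)
    cmap = [[NON_CONTACT] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                cmap[i][j] = CONTACT
    return cmap


def reduce_contact_map(cmap, keep_fraction, rng=None):
    """Mask (1 - keep_fraction) of contacts and of non-contacts as UNKNOWN.

    Residue pairs are drawn uniformly at random, separately for each state,
    and the map is kept symmetric (an assumption of this sketch).
    """
    rng = rng or random.Random()
    n = len(cmap)
    reduced = [row[:] for row in cmap]
    for state in (CONTACT, NON_CONTACT):
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if cmap[i][j] == state]
        n_mask = round(len(pairs) * (1.0 - keep_fraction))
        for i, j in rng.sample(pairs, n_mask):
            reduced[i][j] = reduced[j][i] = UNKNOWN
    return reduced
```

Calling `reduce_contact_map(cmap, 0.7)` would correspond to a 70%-complete map under these assumptions.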


Spatial coordinates of a channel were reconstructed from the contact map in a three-step procedure,
C2S-pipeline, which applied several state-of-the-art bioinformatic tools [33]. Coordinates of Cα atoms were
estimated based on constraints imposed by the contact map using FT-COMAR [35][36]. The protein backbone
was reconstructed by SABBAC [37] and side-chains were added using SCWRL [38]. The protocol was
adapted for modeling multimeric symmetric proteins (see [55]). The structural quality of constructed models
was measured using the following features:
• Full model RMSD related to the original PDB structure,

• Model Cα-Cβ RMSD related to the original PDB structure,

• RMSD of each model amino acid related to the original structure, 

• Diameter of the selectivity filter (SF), 

• Deviation of the selectivity filter oxygen atom, related to the original structure. 


Two types of functional characteristics were calculated for each reconstructed protein structure: the
electrostatic profile at the pore axis, and the current-voltage characteristics. The channel was fitted in
a 129×129×129 grid at 1 Å resolution for the Poisson-Boltzmann calculations using the Adaptive
Poisson-Boltzmann Solver (APBS; [39]). Electrostatic profiles were obtained in the absence of ions and at no external
voltage. Correctness of the electrostatic profile was quantified using the Root Mean Square Error (RMSE)
in reference to the profile calculated for the original channel.
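As a small illustration of the profile comparison, the RMSE between a model profile and the reference profile could be computed as below. The function name is hypothetical, and the sketch assumes both profiles are sampled at the same points along the pore axis.

```python
import math


def profile_rmse(model_profile, reference_profile):
    """Root mean square error between two equally sampled potential profiles."""
    if len(model_profile) != len(reference_profile) or not reference_profile:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = [(m - r) ** 2 for m, r in zip(model_profile, reference_profile)]
    return math.sqrt(sum(squared) / len(squared))
```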

Current-voltage characteristics were determined with 3D PNP Solver using the grids obtained from APBS.
The dielectric constants were assumed as ε = 4 for the protein and ε = 80 for the solute. PNP calculations
were carried out under a parametrization optimized for narrow channels (see [30]), including a grid spacing of 2 Å,
a partition coefficient of 0.4, a dielectric constant in the pore of ε = 40, and the sphere unified model for determining
the pore-radius dependent diffusion coefficient. Computational results obtained from 3D PNP Solver on the
native channel structure were used as the reference characteristics for assessment of predicted models. The
current-voltage characteristics were quantified at external voltage of ±100 mV using the following functional
features:


• Currents

- inward and outward cationic currents ($I^+_{in}$, $I^+_{out}$),

- inward and outward anionic currents ($I^-_{in}$, $I^-_{out}$),

- inward and outward (total) currents ($I_{in}$, $I_{out}$);

• Inward and outward charge selectivities (i.e. ratio of cation to anion current: $I^+_{in}/I^-_{in}$, $I^+_{out}/I^-_{out}$);

• Rectifications of

- cationic current ($|I^+_{out}/I^+_{in}|$),

- anionic current ($|I^-_{out}/I^-_{in}|$),

- (total) current ($|I_{out}/I_{in}|$).


Note that currents can be easily converted to conductance:

$G = |I/V|$,

where V is the electric potential applied to the membrane. The equivalence of the current and conductance
is used throughout the remainder of this document.
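A minimal sketch of this conversion, with units made explicit, is given below. The function name and the choice of pA, mV and pS as working units are assumptions made for illustration; a current in pA divided by a voltage in mV gives nS, hence the factor of 1000.

```python
def conductance_pS(current_pA, voltage_mV):
    """Return G = |I / V| in picosiemens for a current in pA and a voltage in mV."""
    if voltage_mV == 0:
        raise ValueError("conductance is undefined at zero voltage")
    # pA / mV = nS, and 1 nS = 1000 pS
    return abs(current_pA / voltage_mV) * 1000.0
```

With these units, a current of 1 pA at 100 mV corresponds to a conductance of 10 pS.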

In addition to the plain values of the currents, selectivities and rectifications, their deviations from the
corresponding values calculated for the original protein structure were also computed. The
deviation of current was calculated as a difference:


$\Delta I = |I_{\mathrm{model}} - I_{\mathrm{reference}}|$.


The deviation of charge selectivity, and the deviation of rectification were calculated as a natural logarithm 
of a quotient: 


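$\Delta(I^+/I^-) = \left| \ln \frac{I^+_{\mathrm{model}} / I^-_{\mathrm{model}}}{I^+_{\mathrm{reference}} / I^-_{\mathrm{reference}}} \right|$,

$\Delta(I_{out}/I_{in}) = \left| \ln \frac{I_{out,\mathrm{model}} / I_{in,\mathrm{model}}}{I_{out,\mathrm{reference}} / I_{in,\mathrm{reference}}} \right|$.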


Note that wherever the term “deviation” is used throughout this document, it always refers to the 
absolute deviation. 
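The deviation measures can be illustrated with the short sketch below. The function names are hypothetical, and the ratios (selectivity or rectification) are assumed to be passed as positive magnitudes so that the logarithm is defined.

```python
import math


def current_deviation(i_model, i_reference):
    """Absolute deviation of a current from its reference value."""
    return abs(i_model - i_reference)


def log_ratio_deviation(ratio_model, ratio_reference):
    """Absolute natural log of the quotient of two positive ratios.

    Intended for selectivity (I+/I-) or rectification (|I_out/I_in|) values.
    """
    if ratio_model <= 0 or ratio_reference <= 0:
        raise ValueError("ratios must be positive")
    return abs(math.log(ratio_model / ratio_reference))
```

Because of the logarithm, a model that is ten times less selective than the reference and one that is ten times more selective receive the same deviation.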

Dependencies between structural and functional features were evaluated in terms of Kendall’s τ rank
correlation coefficient [40].
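For readers unfamiliar with the statistic, a plain pairwise implementation of Kendall's τ is sketched below. The text does not say which variant was used; this sketch assumes the tie-corrected τ-b, and the function name is hypothetical. It runs in quadratic time, which is adequate for a few thousand models.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation computed by direct pair counting."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both variables do not contribute
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (concordant - discordant) / denom
```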

Datasets with calculated values of functional and structural features, and with Kendall’s τ and p-values
for their correlations, are available as supplemental data (see Supplemental information 1).
2.2 Criteria of functional validity

Current-voltage characteristics obtained for the KcsA open-state conducting structure (PDB: 3FB8) using
3D PNP Solver in our previous work [30] were used to determine criteria for functional quality assessment
of predicted KcsA structures. As the 3D PNP is a semi-quantitative model, thresholds of the functional 
features should not be too conservative. In this study the following cutoffs were applied: 

• Total inward and outward conductance at ±100 mV:

$G_{in}, G_{out} > 10$ pS,

which is equivalent to the following condition for the total inward and outward current at ±100 mV:

$|I_{in}|, |I_{out}| > 1$ pA.

Note that this threshold corresponds to roughly 1/2 of the computational inward conductance and 2/3
of the computational outward conductance of the original KcsA structure 3FB8 [30].

• Inward and outward cation to anion selectivity ratios were arbitrarily set to:

$G^+_{in}/G^-_{in} = I^+_{in}/I^-_{in} > 10:1$ or $50:1$,
$G^+_{out}/G^-_{out} = I^+_{out}/I^-_{out} > 10:1$ or $50:1$.

• Outward rectification at ±100 mV:

$G_{out}/G_{in} = |I_{out}/I_{in}| > 1.0$.


The above defined thresholds provide an intuitive notion of functionally admissible model structures. In
addition, we assume that a predicted model is conducting when its calculated inward and outward conductances
are both between 1 pS and 1 nS. The value of 1 pS is often regarded as the bottom threshold
for ionic channels [41]. We also found that the conductance above 1 nS is an indicator of a porous, leaky
protein structure (i.e. a structure with multiple erroneously created pores).
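The criteria of this section could be applied to a single model roughly as sketched below. The function names, the dictionary layout and the use of strict inequalities at the cutoffs are assumptions of this sketch; conductances are taken in pS and selectivities as cation to anion ratios.

```python
def is_conducting(g_in_pS, g_out_pS, low=1.0, high=1000.0):
    """True when both conductances lie between 1 pS and 1 nS (1000 pS)."""
    return low <= g_in_pS <= high and low <= g_out_pS <= high


def functional_checks(g_in_pS, g_out_pS, sel_in, sel_out,
                      min_conductance=10.0, min_selectivity=10.0):
    """Evaluate each functional criterion separately for one model."""
    return {
        "conducting": is_conducting(g_in_pS, g_out_pS),
        "conductance": g_in_pS > min_conductance and g_out_pS > min_conductance,
        "selectivity": sel_in > min_selectivity and sel_out > min_selectivity,
        # G_out / G_in > 1, written without a division to tolerate G_in = 0
        "outward_rectification": g_out_pS > g_in_pS,
    }
```

Passing `min_selectivity=50.0` gives the more stringent 50:1 selectivity variant.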
2.3 Predictive power of functional characteristics


The functional features were assessed in terms of their ability to select models that are structurally closest 
to the native protein. For this purpose, functionally correct models were regarded as properly classified only 
if their general Cα-Cβ RMSD (or RMSE of the electrostatic profile) was below a selected threshold. Quality
of binary classification at a particular threshold was evaluated in terms of Sensitivity (Sn) and Specificity (Sp):

$Sn = \frac{TP}{TP + FN}$, $Sp = \frac{TN}{TN + FP}$,


where TP denotes the True Positive rate, which expresses the rate of functionally correct models that were
also structurally correct (i.e. below a selected RMSD or RMSE threshold); TN is the True Negative rate
with functionally incorrect models that were also structurally incorrect; FP is the False Positive rate with
functionally correct models which were structurally incorrect; FN is the False Negative rate with functionally
incorrect models that were structurally correct. Additionally, Matthews correlation coefficient (MCC) and
Accuracy (ACC) were also calculated:


$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$,

$ACC = \frac{TP + TN}{TP + TN + FP + FN}$.

Overall performance of classification at various thresholds was analyzed using the Area Under Receiver 
Operating Characteristic curve (AUROC) [42] . 
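The four point measures defined in this section can be computed from a confusion matrix as in the sketch below. The function name is hypothetical, the inputs are taken as counts, and returning 0 for MCC when its denominator vanishes is a convention chosen here rather than something stated in the text.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention of this sketch: MCC is reported as 0 when undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}
```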

The TOP100 sets consisted of 100 models which had the lowest deviations (general Cα-Cβ RMSD,
electrostatic profile RMSE) or the highest plain values (inward selectivity and outward selectivity) of each
feature. In case of the RMSD-based ranking, models 98th to 107th had exactly the same quality and were
all included in the TOP100. In addition to the simple rankings, two joint rankings (RMSD & RMSE, and
inward & outward selectivity) were generated such that both simple rankings were extended to n models
until their intersection contained 100 models.
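The joint ranking procedure can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name is hypothetical, each ranking is assumed to be a list of model identifiers ordered from best to worst, and ties are not treated specially.

```python
def joint_top(ranking_a, ranking_b, size=100):
    """Extend two rankings to their first n entries until they share `size` models.

    Returns (n, shared_models). The shared set may exceed `size` by one model
    when the last extension step adds two common models at once.
    """
    limit = min(len(ranking_a), len(ranking_b))
    for n in range(size, limit + 1):
        shared = set(ranking_a[:n]) & set(ranking_b[:n])
        if len(shared) >= size:
            return n, shared
    raise ValueError("the rankings never share enough models")
```

For example, with rankings `[1, 2, 3, 4, 5]` and `[5, 4, 3, 2, 1]` and `size=3`, both lists have to be extended to their first four entries before they share three models.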






3 Results and Discussion 


3.1 Relation between structural and functional features 

Full contact map set 

In the first experiment, structures of the KcsA channel were reconstructed based on the full contact map.
A total of 430 structural models were generated; 343 of them were conducting, i.e. achieved predicted
conductance between 1 pS and 1 nS.

All the candidate models were structurally correct as their full atom RMSD to the original PDB structure
was between 2 and 2.8 Å. However, in terms of functionality, only 29% of models achieved cation/anion
selectivity of 50:1 in both directions, 38% exhibited correct outward rectification, and 77% achieved conductance
of 10 pS in both directions. The three functional criteria were fulfilled together by only 37 models, which
was roughly 10% of the whole set.

To gain more insight, Kendall’s τ coefficients were calculated between structural and functional features.
The general full atom RMSD of models correlated significantly with deviation of the inward anionic current
$\Delta I^-_{in}$ (Tab. S1). Moreover, the deviation of functional features depended on amino acids around the selectivity
filter, as expected (see Fig.). The strongest association was a positive correlation between rectification
$|I_{out}/I_{in}|$ and the pore diameter at THR75, at the intracellular entrance to the selectivity filter (p-value ≈ 1e-10,
Fig.). Other highly significant correlations were found between the RMSD of THR75 and deviation of rectification
$\Delta(I_{out}/I_{in})$, and between the RMSD of PRO83 and deviation of the inward anionic current $\Delta I^-_{in}$.

Reduced contact map sets 

In the second experiment, protein models were generated from four randomly reduced contact map sets
characterized by different information completeness: 90%, 70%, 50% and 30%. Over 4/5 of all models
achieved full atom RMSD below 4 Å, including all models rebuilt from maps containing 70% or more contact
information (Tab. S2). However, this high RMSD threshold was reached only by 1.4% of models obtained
from 30%-complete maps. In addition, the full atom RMSD of 2/5 of all models was below 2.5 Å. Median
Cα-Cβ RMSD ranged from very good, i.e. 0.76 Å for full contact maps, to poor, i.e. 6 Å for 30%-complete
maps (Fig. and Tab. S2). A similar pattern was observed for the full atom RMSD (from 2.39 Å to 6.9 Å,
respectively, Fig. (inset) and Tab. S2).

Functionally, the inward and outward conductance was within the range of 1 pS to 1 nS (conducting
models) for 1687 (78%) models, and exceeded 10 pS in 72% of conducting models (Tab.). The outward
direction of rectification, $|I_{out}|/|I_{in}| > 1$, was obtained for 41-51% of models, depending on the contact map
completeness. The median value of rectification oscillated between 0.9 and 1.0 (Fig.) and typically was
significantly below the level of 1.39 calculated for the reference structure. The inward and outward selectivity
above 10:1 was reached by only 26% of predicted KcsA structures (Tab.); a few models reached the inward
selectivity level of the original structure (181:1), despite relatively high randomness (Fig.). Selectivity over
50:1 was obtained for just 11% of models. The proportion of highly selective models decreased dramatically with
reduced information in the map; for example, only 2 out of 250 structures from the 30%-complete maps had
selectivity higher than 10:1 in comparison to 193 out of 343 structures from the full map. All the functional
criteria, including the selectivity above 10:1, were collectively fulfilled by 9% of models (almost half of them
were from the full contact maps). Only half of them exhibited selectivity above 50:1. No structure obtained
from the 30%-complete maps met all the functional criteria.

General RMSDs (Cα-Cβ and full atom) and deviation of charge selectivity were the most and second
most correlated pairs of model features, in terms of Kendall’s τ (Tab. S3 and Fig. S1). Deviation of the
inward current was the third most correlated functional feature (0.21-0.24), while correlation of deviation of
the outward current was much weaker (0.12-0.15), yet still statistically significant. Interestingly, correlation
of deviation of the rectification with deviation of any structural feature never exceeded the range of τ between
-0.09 and 0.07.

Discussion. 

Significant correlations between structural RMSD and functional deviations of the charge selectivity and
the total current (Tab. S3) support the hypothesis that predicted structural models could be validated on
the basis of their calculated functional features related to experimental data. Deviation of the anionic current
(experimentally equal to zero) was typically even more highly correlated with structural features than the
charge selectivity (τ higher, up to 0.40, see Fig. S2), consistent with the result based on the full contact
maps (Tab. S1). However, as experimental studies usually do not report the anionic current, further analyses
focus on the selectivity. Interestingly, a significant difference in selectivity between 90%-complete and
100%-complete map models suggests that this feature can be used to distinguish between good and very
good models. The rectification could be, perhaps, more useful for fine tuning of the structure, as suggested
by its relatively high correlations with some structural features in the dataset based on the full contact maps
only (Tab. S1). However, it could also be that 3D PNP Solver is least suited to correctly predict rectification
(see [30]).

The total current typically increased with decreasing completeness of the map (Fig.). This suggested
a tendency of the reconstruction pipeline to produce sparser models (with a larger pore diameter) when
information in the contact map was reduced. A larger pore diameter could also explain why the median
cation to anion selectivity was an order of magnitude weaker for structures built from the 30%-complete maps than from
the full maps (Fig.). Indeed, the cation to anion selectivity in the classical electrodiffusion model applied
by 3D PNP Solver is a result of the presence of negative charges in the selectivity filter and its surroundings,
which prevents passage of negatively charged ions. The effect decreases when the selectivity filter diameter
is larger than the grid resolution (here: 2 Å), as in such a case the pore is represented by two or more
computational cells in the grid and therefore negative charges in the protein wall are partially shielded by
positive charges in the solution. Thus, the classical electrodiffusion model performs better when the pore
cross-section is represented by only one computational cell; in this case the model is consistent with the
single-file ion diffusion through the selectivity filter [43][44].

The six residues which exhibited the highest structure-function correlations are shown in the figure. GLY79
is situated at the extracellular entrance to the selectivity filter (SF). Therefore, any change in its position
or conformation is likely to result in a deviation of the ion flux. ASP80 (not shown, τ > 0.30 only for the
oxygen position) takes part in the transformation of SF to the non-conductive conformation [45]. GLY104
is located in another region of the protein important for inactivation kinetics. It is adjacent to PHE103,
which has been recently reported to act as an interface between the inner helical bundle and SF [45]. It is
possible that the RMSD measure is more sensitive to deviation of atom positions in the small GLY104 than
in the large PHE103. Prolines are typically structurally important elements of a protein. Indeed, PRO83 is
conserved among several channels; moreover, its position is next to TYR82, another large functionally
important residue (see supplementary information). PRO63, while quite far from SF, is adjacent to
large ARG64, one of the residues crucial for the inactivation event [1^. A functional role of GLY88 remains 
unknown, however mutations at this position were linked to disruption of tetramerization m- 

3.2 Predictive power of functional characteristics 

Model classification using functional features 

In this section, we examine if computed functional characteristics of the ion flux can be used to discriminate
between structurally correct and incorrect models. In a systematic analysis, we tested various
thresholds for four functional features: deviations of inward and outward current, and deviations of inward
and outward selectivity. The results formed the basis of a search for the optimal discriminative values which
would allow for the most reliable model classification in relation to the ground truth given by the general
Cα-Cβ RMSD. Only 1674 conducting models were considered. The Cα-Cβ RMSD thresholds were fixed at
values of 1 Å (highly accurate) and 3 Å (correct). The optimal thresholds were selected according to the
maximum product of sensitivity (Sn) and specificity (Sp). Classification quality was evaluated also in terms
of accuracy (ACC) and Matthews correlation coefficient (MCC) at the optimal threshold, and in terms of
the area under the ROC curve (AUROC) as a summary measure over all thresholds (Tab.).
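The threshold search by maximum product of sensitivity and specificity can be sketched as below. The function name is hypothetical, and the sketch assumes that a model counts as functionally correct when its deviation is at or below the candidate threshold and that every observed deviation value is tried as a candidate.

```python
def best_threshold(deviations, is_correct):
    """Return (threshold, Sn, Sp) maximizing Sn * Sp.

    deviations: functional deviation of each model (lower is better).
    is_correct: ground-truth flags, e.g. RMSD below a fixed cutoff.
    """
    positives = sum(1 for c in is_correct if c)
    negatives = len(is_correct) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in the ground truth")
    best = None
    for t in sorted(set(deviations)):
        tp = sum(1 for d, c in zip(deviations, is_correct) if d <= t and c)
        fp = sum(1 for d, c in zip(deviations, is_correct) if d <= t and not c)
        sn = tp / positives
        sp = (negatives - fp) / negatives
        if best is None or sn * sp > best[1] * best[2]:
            best = (t, sn, sp)
    return best
```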

The classification based on deviation of the inward selectivity produced ROC curves which were well
above the diagonal for both RMSD thresholds (AUROC 0.72-0.78) and shifted towards specificity (Fig.).
The classification based on the current deviation resulted in similar AUROC for the RMSD of 3 Å, while
it was lower for the 2 Å threshold (AUROC 0.59-0.65). In these cases the ROC curve was shifted towards
sensitivity. The deviation of selectivity displayed the best balance between retaining good quality models
and rejecting structurally incorrect models (MCC 0.38-0.39). Overall performance of the classification was
better for higher RMSD thresholds. The optimal thresholds were relatively lower for the outward direction of
selectivity (by 14-18%), and for the inward direction of current (by 8-20%). In practical terms, applying the
optimal thresholds of selectivity retained ca. 70% of structurally correct models (518-548 out of 752 models
with RMSD < 1 Å, and 940-1015 out of 1412 models with RMSD < 3 Å) at the cost of retaining 15-33% of
structurally incorrect models in the group of functionally correct models (288-313 out of 935 models with
RMSD > 1 Å, and 42-69 out of 275 models with RMSD > 3 Å). Using the optimal thresholds of current
deviations as a classifier resulted in retaining 71-82% of correct models and 41-66% of incorrect models.

Electrostatic RMSE as a complementary ground truth 

In Section 3.1 we reported that a significant proportion of models were non-conducting while having a
relatively low general RMSD. This raised doubts whether general RMSD was an appropriate solitary measure
of the channel structure quality. Therefore, we propose the electrostatic potential profile in the native channel
structure as a new ground truth (Fig.), and its Root Mean Square Error (RMSE) as an alternative to
the entirely structure-based RMSD. It can be argued that the RMSE of the electrostatic potential profile
is a measure that balances the structural and functional quality of the channel model as the electrostatic
potential profile is determined by the structure, and determines the channel function at the same time.
While the electrostatic potential profile cannot be measured experimentally, it could be used to assess the
relationship between structural and functional quality, and to evaluate the predictive power of calculated
current-voltage characteristics.

First, we established the relation between the structural RMSD and the electrostatic RMSE threshold
(Fig.). We found that the electrostatic profile RMSE and the general Cα-Cβ RMSD were generally
well correlated (Kendall’s τ = 0.45). However, the relation was much weaker for low RMSD structures
(τ = 0.12 for RMSD < 1.7 Å). In this group consisting of the most accurate models the two characteristics
were complementary to each other.

Next, we searched for the optimal thresholds for the four functional features (the inward and outward,
current and selectivity deviations) to obtain the most reliable classification in terms of the specificity and
sensitivity product, related to the electrostatic RMSE at fixed thresholds of 0.3, 0.4 and 0.5 V. Again, only the
conducting models were considered. Generally, classification characteristics were similar to those obtained in the
classification in relation to the structural RMSD (Fig. and Tab. S4). Interestingly, overall performance of
the classification was more sensitive to changing the threshold of the ground truth than in the RMSD-related
experiment. This finding is consistent with a presumably closer relationship between the current-voltage
characteristics and the electrostatic profile.

Practical scenarios 

In this section, two practical scenarios of our model quality assessment approach are analyzed. In order
to emulate real-life use cases, plain values of functional features were used instead of deviations related
to the true structure.

In the first scenario, the goal was to reduce a collection of candidate structural models. They were
subjected to criteria of functional correctness in reference to available experimental current-voltage
characteristics. The criteria were rather liberal, keeping in mind the semi-quantitative character of the 3D PNP
model. We checked how applying intuitive functional conditions (defined in Sec. 2.2) reduces the dataset and
enriches it in structures with low general Cα-Cβ RMSD and profile RMSE (Tab., top).

None of the conductance conditions improved the quality of the resulting subset in comparison to the initial
dataset (Fig. 7). In contrast, the enrichment in high-quality candidates due to the selectivity criteria was
substantial (Fig. 7). Virtually all structures with a selectivity ratio above 10:1 were within RMSD < 3 Å and
RMSE < 0.5 V of the real structure, compared to ca. 80% in the whole population. Moreover, the fraction
of highly accurate structures increased from 44% to 75% (RMSD < 1 Å) or from 38% to 57% (RMSE < 0.3 V).
The improvement was even more pronounced with the more stringent selectivity criterion (an increase of the
highly accurate fraction by almost 90%). Median RMSD and RMSE were reduced by 20-30% and all exceptionally
wrong models were filtered out (no RMSD/RMSE was above 5.44 Å/0.58 V for S10 or above 2.61 Å/0.45 V
for S50, Fig. 7). Finally, adding the outward rectification condition slightly worsened the candidate set in
terms of enrichment in structurally and electrostatically accurate models (Fig. 7).
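The filtering step of the first scenario can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the record fields (`g_pS`, `sel_in`, `sel_out`, `rmsd`), the helper names, and the toy values are hypothetical, and a single total conductance per model is assumed for the conductance check.

```python
def conducting(model):
    """Conducting model: conductance between 1 pS and 1 nS (1000 pS)."""
    return 1.0 < model["g_pS"] < 1000.0


def s10(model):
    """S10: conducting, with inward and outward selectivity above 10:1."""
    return conducting(model) and model["sel_in"] > 10.0 and model["sel_out"] > 10.0


def enrichment(models, condition, is_accurate):
    """Return (subset size, percentage of the subset that is accurate)."""
    kept = [m for m in models if condition(m)]
    if not kept:
        return 0, float("nan")
    hits = sum(1 for m in kept if is_accurate(m))
    return len(kept), 100.0 * hits / len(kept)


# Hypothetical toy records, not data from the study.
toy_models = [
    {"g_pS": 20.0, "sel_in": 120.0, "sel_out": 90.0, "rmsd": 0.8},
    {"g_pS": 15.0, "sel_in": 30.0, "sel_out": 12.0, "rmsd": 2.5},
    {"g_pS": 40.0, "sel_in": 4.0, "sel_out": 20.0, "rmsd": 0.9},
    {"g_pS": 0.2, "sel_in": 200.0, "sel_out": 150.0, "rmsd": 6.0},
    {"g_pS": 25.0, "sel_in": 3.0, "sel_out": 2.0, "rmsd": 4.2},
]
print(enrichment(toy_models, lambda m: True, lambda m: m["rmsd"] < 1.0))  # no filter
print(enrichment(toy_models, s10, lambda m: m["rmsd"] < 1.0))             # S10 filter
```

Comparing the two printed pairs mirrors the "none" and "S10" rows of the enrichment table: the subset shrinks while the share of accurate models in it grows.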


In the second scenario, the goal was to select the best 100 candidate models (TOP100). We checked
to what extent the 100 best models in terms of the cation-to-anion selectivity overlap with the 100 best
models in terms of the general Cα-Cβ RMSD and/or the profile RMSE (see Tab. 4, bottom). Models were
ranked separately in four simple categories (general Cα-Cβ RMSD, profile RMSE, inward selectivity, outward
selectivity) and in two joint categories (RMSD & RMSE, and inward & outward selectivity; see Methods). In
TOP100[RMSD] the RMSD ranged from 0.717 to 0.749 Å, in TOP100[RMSE] the RMSE ranged from 0.098
to 0.171 V, and in TOP100[RMSD&RMSE] the RMSD ranged from 0.717 to 0.774 Å and the RMSE ranged
from 0.121 to 0.232 V. In TOP100[$I^{+}_{in}/I^{-}_{in}$] the inward selectivity ranged from 70:1 to 192:1, in
TOP100[$I^{+}_{out}/I^{-}_{out}$] the outward selectivity ranged from 108:1 to 518:1, and in
TOP100[$I^{+}_{in}/I^{-}_{in}$ & $I^{+}_{out}/I^{-}_{out}$] the inward selectivity ranged
from 62:1 to 185:1 and the outward selectivity ranged from 89:1 to 518:1.

The probability of finding a TOP100 model from a ground-truth-based ranking by chance was less than
5%. The odds increased drastically when only the 100 most cation-selective models were considered. Most
notably, the TOP100 according to outward selectivity included 33 of the best 100 models in terms of
the joint RMSD and RMSE criterion (7 times better than random). Enrichment in the TOP100 based on
the inward selectivity was weaker but still significant (from 1.9 times for TOP100[RMSD] to 4.1 times for
TOP100[RMSD&RMSE]). Enrichment in the TOP100 based on the joint ranking of inward and outward
selectivity ranged from 2.6 to 4.8 times, depending on the ground truth.
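The overlap between a selectivity-based and a ground-truth-based ranking, and its gain over random selection, can be computed as in the sketch below. The function names and the toy values are assumptions made for illustration, and ties are broken by input order, which may differ from the tie handling used in the study.

```python
def top_k(scores, k, higher_is_better=False):
    """Indices of the k best models (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return set(order[:k])


def overlap_enrichment(feature_top, truth_top, n_models):
    """Return (percent of feature_top found in truth_top, fold gain over random)."""
    shared = len(feature_top & truth_top)
    percent = 100.0 * shared / len(feature_top)
    random_percent = 100.0 * len(truth_top) / n_models  # chance expectation
    return percent, percent / random_percent


# Hypothetical toy values for 10 models; k = 3 stands in for TOP100.
toy_rmsd = [0.7, 0.9, 1.5, 3.0, 0.8, 2.2, 5.0, 1.1, 4.0, 2.8]
toy_selectivity = [150, 20, 60, 5, 300, 8, 2, 90, 3, 12]

best_by_rmsd = top_k(toy_rmsd, 3)                                   # lower is better
best_by_sel = top_k(toy_selectivity, 3, higher_is_better=True)      # higher is better
percent, fold = overlap_enrichment(best_by_sel, best_by_rmsd, len(toy_rmsd))
print(sorted(best_by_rmsd), sorted(best_by_sel), round(percent, 1), round(fold, 2))
```

The same arithmetic applied to the reported values, 33% overlap against a chance expectation of 4.6%, gives a gain of roughly 7 times.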


Discussion 

The fairly good AUROC values for classification based on the selectivity and current deviations showed that
these features are sensitive to the structural and electrostatic quality of models and are therefore suitable for
separating models with low and high structural RMSD or electrostatic profile RMSE. However, the ranges of
current defined by the optimal deviation thresholds were below the experimental values of the current (Tab. 1 and
Tab. 3). In addition, we found that liberal thresholds of current, which take into account the approximate accuracy
of the classical 3D PNP model, could not be effectively used to filter out structurally or electrostatically
inaccurate models. Consequently, with the classical electrodiffusion model, the current-based criterion can
be employed only for eliminating models with an occluded pore or with multiple erroneously created pores.
The optimal thresholds for selectivity deviation translate to selectivity cutoffs ranging from 2.3:1 (inward
selectivity, RMSD < 3 Å) to 4.1:1 (outward selectivity, RMSD < 1 Å). These cutoffs were severely underestimated
with respect to the experimental data (no anionic current) and to the computational results for the original
structure (Tab. 1). However, owing to the very good specificity of the selectivity-based classification (i.e. retrieving
ca. 20% of structurally accurate models with few false positives, Fig. 4d), the condition of high selectivity
(above 10:1) proved to be practical for model quality assessment (Tab. 4 and Fig. 7). While further
studies are required to verify whether the semi-quantitative accuracy of the classical 3D PNP in predicting
selectivity is sufficient for assessment of candidate models of mildly selective channels (such as alpha-hemolysin,
GLIC, etc.), the present study showed that the method is appropriate for the class of strongly selective channels.


4 Conclusions 

In this study, we proposed a novel function-oriented approach to the single model quality assessment which 
is complementary to existing methods. The approach is applicable to analysis of structural models of proteins 
whose quantitative functional characteristics are known. This general idea was applied to quality assessment 
of structural models of potassium channel KcsA generated from contact maps of varying quality. The 
evaluation was based on current-voltage characteristics computed for predicted structures using the classical 
3D Poisson-Nernst-Planck model, which were compared to available results from patch-clamp experiments. 

We found that the structural quality of candidate models, in terms of RMSD to the original structure,
was significantly correlated with predicted conductance and charge selectivity (Kendall’s rank correlation
up to 0.4). This supported the initial hypothesis that predicted structural models could be validated on
the basis of their calculated functional features related to experimental data. It was further confirmed by
good performance in separating models with low and high RMSD on the basis of deviations of current and
selectivity from their values computed for the true structure (AUROC up to 0.78).
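For readers less familiar with the two statistics quoted above, the following self-contained sketch shows how they can be computed. It is an illustration only: the tau-a variant of Kendall's coefficient is assumed (the variant used in the study is not stated here), the function names are hypothetical, and the numbers are toy values.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy values: structural RMSD and a functional-feature deviation.
toy_rmsd = [0.7, 0.9, 1.5, 3.0, 4.0]
toy_dev = [0.3, 1.2, 1.0, 2.5, 0.8]

tau = kendall_tau_a(toy_rmsd, toy_dev)
# A small deviation should indicate an accurate model, so the score is -deviation.
area = auroc([-d for d in toy_dev], [r < 1.0 for r in toy_rmsd])
print(round(tau, 2), round(area, 3))
```

The quadratic pair loops are adequate for a few thousand models; a library routine would be preferable for larger sets.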

In practical terms, our approach had to deal with the limitations of the classical 3D PNP, which is a fast
but approximate method and could not accurately reproduce the experimental characteristics for the reference
structure. Therefore, cutoff thresholds for assessing the functional correctness of a model had to be set liberally.
Under these conditions, we showed that evaluating predicted conductance was an appropriate method to
eliminate models with an occluded pore or with multiple erroneously created pores. In addition, filtering
models on the basis of their predicted charge selectivity resulted in a substantial enrichment of the candidate
set in highly accurate models. For example, by requiring charge selectivity above 10:1, we obtained a
high-accuracy subset containing 21% of the candidate models, of which 99% had Cα-Cβ RMSD below 3 Å. This shows
that the method can be directly applied to the evaluation of structural models of at least strongly selective
protein channels. Moreover, it can be expected that the efficiency of our model quality assessment method will
improve when more accurate and comparably fast continuous models of ion flow in a protein channel become
available.

Our work raises the important question of how to define the correctness of an ion channel model. Is the general
RMSD an appropriate ground truth measure in this context? We found that a significant proportion of
models were occluded while having a low general RMSD. It is unlikely that this could be attributed solely
to the coarse resolution and discretization used in the PNP calculations. In addition, a considerable variation
of electrostatic profiles was found in the group of models characterized by Cα-Cβ RMSD below 1 Å. Therefore,
we investigated using the electrostatic potential profile of the reference structure as a complementary ground
truth. Not surprisingly, models with low and high RMSE of the electrostatic profile were well separated
on the basis of deviations of current and selectivity (AUROC up to 0.76). Very interestingly, the selection
of the 100 best models in terms of selectivity was significantly more enriched in TOP100 models with the
joint lowest RMSE and RMSD than in models with the lowest RMSD. While the electrostatic profile
cannot be measured experimentally, our results indicate that predicted current-voltage characteristics convey
information about electrostatics. This important information about the correctness of a model is complementary
to the general RMSD. This suggests that the computational validation of functionality should perhaps be
included in the evaluation process of structural models whenever possible.
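As a reminder of what the complementary ground truth measures, the sketch below computes the RMSE between a model profile and the reference profile. It assumes that both electrostatic profiles are sampled at the same points along the pore axis; the function name and the sample values are hypothetical and do not come from the study.

```python
import math


def profile_rmse(model_profile, reference_profile):
    """RMSE (in volts) between two potential profiles on the same grid."""
    if len(model_profile) != len(reference_profile):
        raise ValueError("profiles must be sampled at the same points")
    squared = [(m - r) ** 2 for m, r in zip(model_profile, reference_profile)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical potentials (V) at five positions along the pore axis.
toy_reference = [0.0, -0.2, -0.5, -0.2, 0.0]
toy_model = [0.1, -0.1, -0.2, -0.3, 0.0]
print(round(profile_rmse(toy_model, toy_reference), 3))
```

A model can score well on the general RMSD and still show a large profile RMSE if a few charged side chains near the pore are misplaced, which is why the two measures are treated as complementary.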
Figure captions


Figure 1: Kendall’s rank correlations of amino acid RMSD and deviations of functional features
in models reconstructed from full contact maps. (A) All significant correlations
(p-value<0.01) between an amino acid RMSD and deviations of at least 2 (blue) or 1 (cyan) functional
features. (B) The strongest correlation was observed between the pore diameter at THR75 (orange)
and rectification. Other strong correlations included the RMSD of THR75 and deviation of rectification,
and the RMSD of PRO83 (pink) and deviation of the inward anionic current.

Figure 2: Structural and functional quality of reconstructed KcsA models. (a) Structural
Cα-Cβ (main) and full atom (inset) RMSD of predicted KcsA structures in subsets built using various
percentages of contact maps. (b-d) Functional characteristics of predicted KcsA structures in 100 mM
KCl at ±100 mV; only the conducting models were considered: (b) total outward and inward currents,
(c) outward and inward cation to anion selectivity, and (d) rectification (outward to inward current
ratio). Notations: whiskers - min and max, box edges - 25th and 75th percentile, inner line - median,
dotted line - value calculated for the reference structure.

Figure 3: Most significant Kendall’s rank correlations of amino acid RMSD and deviations
of functional features in all models. Notations: black - LEU40, gray - PRO63 (outside of the
protein) and PRO83 (middle of the protein), white - GLY79 (extracellular entrance to the SF), GLY88
(outside of the channel) and GLY104 (intracellular entrance to the channel).

Figure 4: ROC curves of model classification based on deviations of current and selectivity at
four thresholds of the general Cα-Cβ RMSD. Only the conducting models were considered. The
RMSD thresholds corresponded to the following positive/negative ratios: 1 Å: 752/935, 2 Å: 1229/458,
3 Å: 1412/275, 4 Å: 1499/188.

Figure 5: Electrostatic profile RMSE as a complementary ground truth. (a) Exemplary
electrostatic profiles of the reference channel pore (solid) and two modeled channel pores: correct (dashed
line) and incorrect (dash-dotted line). (b) Scatter plot of the electrostatic profile RMSE versus the
general Cα-Cβ RMSD. The axes have been cut at the 10 Å and 1 V thresholds of RMSD and RMSE,
respectively. Both measures are overall well correlated (Kendall’s τ = 0.45), but the relation is much
weaker for low RMSD structures (τ = 0.12 for RMSD < 1.7 Å).

Figure 6: ROC curves of model classification based on deviations of current and selectivity
at three thresholds of the electrostatic profile RMSE. Only the conducting models were considered.
The RMSE thresholds corresponded to the following positive/negative ratios: 0.3 V: 625/1049,
0.4 V: 1025/649, 0.5 V: 1324/350. The AUROC was in the following ranges: (a) 0.61-0.72, (b) 0.56-0.66,
(c) 0.64-0.74, (d) 0.66-0.76.

Figure 7: Quality enrichment of the candidate subsets using several functional criteria. (a)
Box plots of the structural Cα-Cβ RMSD and the electrostatic profile RMSE in groups of models
fulfilling functional conditions: cond: the conducting models, i.e. 1 pS < G < 1 nS, C10: G > 10 pS, S10:
$G^{+}/G^{-}$ > 10:1, S50: $G^{+}/G^{-}$ > 50:1, RO: $G_{out}/G_{in}$ > 1. Notations: red line - median, box edges -
25th and 75th percentile, whiskers - min and max. (b) Scatter plot of the electrostatic profile RMSE
versus the general Cα-Cβ RMSD. The axes have been cut at the 10 Å and 1 V thresholds of RMSD and
RMSE, respectively. Color code: green - the conducting models with inward and outward selectivities
$G^{+}/G^{-}$ > 10:1, blue - remaining conducting models (1 pS < G < 1 nS), black - all other models.
Tables


Parameter                                               Experimental   Computational
Inward (-100 mV)
  Total conductance                                     57 pS          15 pS
  Cation/anion selectivity $I^{+}_{in}/I^{-}_{in}$      ∞:1            181:1
Outward (+100 mV)
  Total conductance $I_{out}$                           75 pS          21 pS
  Cation/anion selectivity $I^{+}_{out}/I^{-}_{out}$    ∞:1            111:1
Rectification $|I_{out}/I_{in}|$                        1.29           1.39


Table 1: Selected experimental [32] and computational [30] parameters of KcsA I-V curves.
Computational results obtained using 3D PNP Solver on the 3FB8 structure.
CMAP    Conducting   Conductance   Rectification            Selectivity $I^{+}/I^{-}$   All criteria incl. selectivity
        models       G > 10 pS     $|I_{out}| > |I_{in}|$   > 10         > 50           > 10         > 50
100%    343          265 (77%)     140 (41%)                193 (56%)    101 (29%)      67 (20%)     38 (11%)
90%     354          229 (65%)     182 (51%)                120 (34%)    50 (14%)       43 (12%)     18 (5%)
70%     361          245 (68%)     174 (48%)                98 (27%)     24 (7%)        32 (9%)      10 (3%)
50%     379          277 (73%)     165 (43%)                49 (13%)     19 (5%)        12 (3%)      5 (1.3%)
30%     250          203 (81%)     117 (47%)                2 (0.6%)     -              -            -
TOTAL   1687         1219 (72%)    778 (46%)                462 (26%)    194 (11%)      154 (9%)     71 (4%)


Table 2: Functional quality assessment of models based on randomly reduced contact maps.
The table accounts only for conducting models (1 pS < G < 1 nS). The CMAP column indicates completeness
of randomly reduced contact maps.
Condition                                      RMSD Cα-Cβ   Sp·Sn   Sp     Sn     ACC    MCC    AUROC
$\Delta I_{in}$ < 0.80 pA                      < 1 Å        0.37    0.52   0.71   0.60   0.23   0.65
$\Delta I_{in}$ < 1.15 pA                      < 3 Å        0.48    0.59   0.82   0.78   0.34   0.75
$\Delta I_{out}$ < 1.00 pA                     < 1 Å        0.32    0.44   0.71   0.56   0.16   0.59
$\Delta I_{out}$ < 1.25 pA                     < 3 Å        0.44    0.55   0.80   0.76   0.29   0.70
$\Delta (I^{+}_{in}/I^{-}_{in})$ < 3.90        < 1 Å        0.48    0.69   0.69   0.69   0.38   0.72
$\Delta (I^{+}_{in}/I^{-}_{in})$ < 4.40        < 3 Å        0.54    0.75   0.72   0.72   0.36   0.76
$\Delta (I^{+}_{out}/I^{-}_{out})$ < 3.35      < 1 Å        0.48    0.67   0.73   0.69   0.39   0.74
$\Delta (I^{+}_{out}/I^{-}_{out})$ < 3.60      < 3 Å        0.56    0.85   0.67   0.70   0.38   0.78


Table 3: Optimal classification based on selectivity deviation and current deviation related to
Cα-Cβ RMSD as the ground truth. Only the conducting models are considered. The RMSD thresholds
corresponded to the following positive/negative ratios: 1 Å: 752/935, 3 Å: 1412/275.
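The column headings of Table 3 follow the usual confusion-matrix definitions, which the sketch below makes explicit. The function name is hypothetical, and the counts in the example are not taken from the study: they are approximate values back-calculated from the rounded Sp and Sn of the first row and the 752/935 positive/negative ratio, used here only as a consistency check.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, their product, accuracy and MCC."""
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"Sp*Sn": sp * sn, "Sp": sp, "Sn": sn, "ACC": acc, "MCC": mcc}


# Approximate counts back-calculated from rounded table values (an assumption):
# 752 positives split into 534 TP + 218 FN, 935 negatives into 486 TN + 449 FP.
metrics = classification_metrics(tp=534, fp=449, tn=486, fn=218)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these counts the rounded metrics agree with the first row of the table, which suggests that the tabulated ACC and MCC are consistent with the reported Sp, Sn and class sizes.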
Table 4: Quality enrichment of the candidate subsets using several functional criteria. Enrichment
denotes the fraction of models fulfilling a functional condition which are within a given structural or electrostatic
threshold. Functional parameters were calculated only for the conducting models. (top) The functional
conditions were defined as follows: cond: the conducting models, i.e. 1 pS < G < 1 nS, C10: G > 10 pS, S10:
$G^{+}/G^{-}$ > 10:1, S50: $G^{+}/G^{-}$ > 50:1, RO: $G_{out}/G_{in}$ > 1. (bottom) TOP100 denotes the best 100
models fulfilling all conditions given as an argument. Here, the enrichment is effectively the fraction of models
which belong to the intersection of a pair of TOP100 rankings. Note that models 98th to 107th in the
Cα-Cβ RMSD-based ranking had exactly the same quality (0.749 Å).


Condition   #models   median     median     Enrichment [%]
                      RMSD [Å]   RMSE [V]   RMSD < 1 Å   RMSD < 3 Å   RMSE < 0.3 V   RMSE < 0.5 V
none        2158      1.14       0.35       44           82           38             79
cond        1674      1.12       0.35       45           84           37             79
C10         1213      1.15       0.34       44           82           38             78
S10         458       0.80       0.28       75           99           57             98
S50         191       0.78       0.26       82           100          72             100
S10 & RO    203       0.84       0.32       75           100          41             98
S50 & RO    71        0.78       0.29       75           100          61             100


Condition                                                      #models   median     median     Enrichment [%]
                                                                         RMSD [Å]   RMSE [V]   TOP100[RMSD]   TOP100[RMSE]   TOP100[RMSD&RMSE]
none                                                           2158      1.14       0.35       5.0            4.6            4.6
TOP100[$I^{+}_{in}/I^{-}_{in}$ & $I^{+}_{out}/I^{-}_{out}$]    100       0.78       0.25       13             16             22
TOP100[$I^{+}_{in}/I^{-}_{in}$]                                100       0.79       0.26       9.4            16             19
TOP100[$I^{+}_{out}/I^{-}_{out}$]                              100       0.78       0.24       18             19             33
Figure 7: Quality enrichment of the candidate subsets using several functional criteria. (a) Box plots of
the structural Cα-Cβ RMSD and the electrostatic profile RMSE in groups of models fulfilling functional
conditions: cond: the conducting models, i.e. 1 pS < G < 1 nS, C10: G > 10 pS, S10: $G^{+}/G^{-}$ > 10:1, S50:
$G^{+}/G^{-}$ > 50:1, RO: $G_{out}/G_{in}$ > 1. Notations: whiskers - min and max, box edges - 25th and 75th
percentile, inner line - median. (b) Scatter plot of the electrostatic profile RMSE versus the general Cα-Cβ
RMSD. The axes have been cut at the 10 Å and 1 V thresholds of RMSD and RMSE, respectively. Notations:
magenta - the conducting models with inward and outward selectivities $G^{+}/G^{-}$ > 10:1, black - all other
models.
Supplemental Data

Supplemental Table 1


Structural feature        Functional feature     τ             p-value
Pore diameter at THR75    $|I_{out}/I_{in}|$     0.23 - 0.24   2.4 × 10^-[illegible] - 5.5 × 10^-11
RMSD of THR75                                    0.14 - 0.21   6.5 × 10^-5 - 2.9 × 10^-9
RMSD of PRO83                                    0.20          2.3 × 10^-[illegible]
General full atom RMSD    $\Delta I^{-}$         0.19          1.9 × 10^-[illegible]


Most significant Kendall correlations between structural and functional features based on the full contact
map set. Note that side chains of the reconstructed tetramers of KcsA were not perfectly symmetric, which in
the case of THR75 resulted in a range of τ.
Supplemental Table 2


Contact map     Number of   Full atom      Median RMSD [Å]
completeness    models      RMSD < 4 Å     Cα-Cβ    Full atom
100%            430         430 (100%)     0.76     2.39
90%             460         460 (100%)     0.85     2.42
70%             460         460 (100%)     1.16     2.56
50%             465         412 (89%)      2.04     3.30
30%             361         5 (1.4%)       5.95     6.87
30-100%         2176        1767 (81%)     1.14     2.61


Structural quality assessment of models based on randomly reduced contact maps.
Supplemental Table 3


Structural RMSD                 Functional feature
                                $|\Delta I_{in}|$   $|\Delta (I^{+}_{in}/I^{-}_{in})|$   $|\Delta I_{out}|$   $|\Delta (I^{+}_{out}/I^{-}_{out})|$   $|\Delta |I_{out}/I_{in}||$
general Cα-Cβ                   0.23                0.32                                 0.15                 0.34                                   0.01
general full atom               0.24                0.31                                 0.13                 0.32                                   0.02
LEU40, PRO63, GLY79,
PRO83, GLY88, GLY104            0.21-0.23           0.28-0.30                            0.12-0.15            0.30-0.31                              -0.01 to 0.03
O_GLY79, O_ASP80                0.22-0.23           0.29                                 0.12-0.14            0.31                                   0.01-0.02


Structural features whose Kendall’s correlation with at least one functional feature was above 0.30.
Supplemental Table 4


Condition                              RMSE      Sp·Sn   Sp     Sn     ACC    MCC    AUROC
$|\Delta (I^{+}/I^{-})|$ < 3.90        < 0.3 V   0.43    0.72   0.60   0.65   0.31   0.70
$|\Delta (I^{+}/I^{-})|$ < 4.15        < 0.5 V   0.51    0.78   0.65   0.68   0.35   0.75
$|\Delta (I^{+}/I^{-})|$ < 2.80        < 0.3 V   0.44    0.83   0.53   0.65   0.36   0.71
$|\Delta (I^{+}/I^{-})|$ < 3.60        < 0.5 V   0.51    0.76   0.67   0.69   0.36   0.76
$|\Delta I_{out}|$ < 0.75 pA           < 0.3 V   0.38    0.59   0.65   0.63   0.23   0.67
$|\Delta I_{out}|$ < 1.00 pA           < 0.5 V   0.44    0.59   0.75   0.72   0.30   0.72
$|\Delta I_{in}|$ < 0.90 pA            < 0.3 V   0.35    0.55   0.64   0.61   0.19   0.61
$|\Delta I_{in}|$ < 0.90 pA            < 0.5 V   0.38    0.62   0.62   0.62   0.19   0.66


Optimal classification parameters based on selectivity deviation and current deviation related to electrostatic
RMSE as the ground truth.


Supplemental Figure 1

Scatter plot of general RMSD vs. absolute deviation of outward selectivity.


Supplemental Figure 2

Scatter plot of general Cα-Cβ RMSD vs. absolute deviation of the inward anionic current.